Korean Erroneous Sentence Classification With Integrated Eojeol Embedding
Abstract
This paper analyzes the Korean sentence classification system. Sentence classification is the task of assigning an input sentence to one of a set of predefined categories. However, spelling or spacing errors in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach, Integrated Eojeol (a Korean syntactic word separated by spaces) Embedding, to reduce the effect of poorly analyzed morphemes on sentence classification. It also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that applying the proposed approach to existing classifiers increased accuracy on erroneous sentences by 8% to 15%.
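The abstract does not describe how Integrated Eojeol Embedding or the two noise insertion methods work, so the following is only a minimal sketch under stated assumptions. It shows why a single spacing error changes eojeol segmentation (the classic example "아버지가 방에 들어가신다", "Father enters the room", versus "아버지 가방에 들어가신다", "Father enters the bag"), and it pairs that with a hypothetical spacing-noise function, `insert_space_noise`, that randomly deletes or inserts spaces. The function names and the parameter `p` are illustrative assumptions, not the paper's method.

```python
import random


def eojeols(sentence: str) -> list[str]:
    """Split a Korean sentence into eojeols (space-separated syntactic words)."""
    return sentence.split()


def insert_space_noise(sentence: str, p: float, rng: random.Random) -> str:
    """Hypothetical spacing-noise generator (not the paper's method).

    Each existing space is dropped with probability p (a missing-space error),
    and a spurious space is added after each non-space character with
    probability p (an extra-space error). Characters are never changed.
    """
    out = []
    for ch in sentence:
        if ch == " ":
            # Keep the space unless it is randomly deleted.
            if rng.random() >= p:
                out.append(ch)
        else:
            out.append(ch)
            # Randomly split the eojeol after this character.
            if rng.random() < p:
                out.append(" ")
    # Collapse repeated spaces and strip leading/trailing ones.
    return " ".join("".join(out).split())


if __name__ == "__main__":
    correct = "아버지가 방에 들어가신다"   # "Father enters the room"
    erroneous = "아버지 가방에 들어가신다"  # one misplaced space: "Father enters the bag"
    print(eojeols(correct))
    print(eojeols(erroneous))
    print(insert_space_noise(correct, 0.2, random.Random(7)))
```

A classifier that embeds whole eojeols sees different tokens for the two sentences even though they differ by one space. Training on randomly perturbed copies is one generic way to expose a model to such errors; the paper's own noise insertion methods may differ.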
Similar Resources
Experiments with Sentence Classification
We present a set of experiments involving sentence classification, addressing issues of representation and feature selection, and we compare our findings with similar results from work on the more general text classification task. The domain of our investigation is an email-based help-desk corpus. Our investigations compare the use of various popular classification algorithms with various popul...
Integrated sentence generation with charts
Integrating surface realization and the generation of referring expressions (REs) into a single algorithm can improve the quality of the generated sentences. Existing algorithms for doing this, such as SPUD and CRISP, are search-based and can be slow or incomplete. We offer a chart-based algorithm for integrated sentence generation which supports efficient search through chart pruning.
Numeric-attribute-powered Sentence Embedding
Modern embedding methods focus only on the words in the text. The word or sentence embeddings are trained to represent the semantic meaning of the raw texts. However, many quantified attributes associated with the text, such as numeric attributes associated with Yelp review text, are ignored in the vector representation learning process. Those quantified numeric attributes can provide important...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large amount of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Enhancing Sentence Relation Modeling with Auxiliary Character-level Embedding
Neural-network-based approaches for sentence relation modeling automatically generate hidden matching features from raw sentence pairs. However, the quality of the matching feature representation may be unsatisfactory due to complex semantic relations such as entailment or contradiction. To address this challenge, we propose a new deep neural network architecture that jointly leverages pre-trained wo...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3085864